G-KV: Decoding-Time KV Cache Eviction with Global Attention

Liao, Mengqi, Wang, Lu, Zhang, Chaoyun, Shen, Zekai, Mao, Xiaowei, Qin, Si, Lin, Qingwei, Rajmohan, Saravan, Zhang, Dongmei, Wan, Huaiyu

arXiv.org Artificial Intelligence

Recent reasoning large language models (LLMs) excel in complex tasks but encounter significant computational and memory challenges due to long sequence lengths. KV cache compression has emerged as an effective approach to greatly enhance the efficiency of reasoning. However, existing methods often focus on prompt compression or token eviction with local attention scores, overlooking the long-term importance of tokens. We propose G-KV, a KV cache eviction method that employs a global scoring mechanism, combining local and historical attention scores to more accurately assess token importance. Additionally, we introduce post-training techniques, including reinforcement learning and distillation, to optimize models for compressed KV cache settings. The code for this paper is available at: https://github.com/microsoft/G-KV.

Large language models (LLMs) have garnered widespread attention and applications. Recently released reasoning models have demonstrated remarkable performance (Guo et al., 2025; Team et al., 2025; Yang et al., 2025), even in addressing complex tasks such as mathematics and coding. These reasoning models achieve significant improvements across various problems through long chain-of-thought (CoT) (Wei et al., 2022), enabling iterative reflection and verification. However, the long CoT of reasoning models typically consists of thousands or even tens of thousands of tokens. This imposes a substantial increase in computational costs and KV cache memory consumption. Notably, the computation of attention becomes a critical bottleneck, as its complexity scales quadratically with the sequence length. To overcome the bottlenecks of memory and computational complexity, numerous optimization methods for KV cache or attention mechanisms have been proposed (Li et al., 2024a). Among these, some methods prune the KV cache of tokens, significantly reducing computational overhead and memory consumption.
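The abstract gives no pseudocode, but one simple way to combine local and historical attention scores is an exponential moving average per cached token, evicting the lowest-scoring entries. The EMA form, the decay factor alpha, and the function names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def update_global_scores(global_scores, local_attn, alpha=0.9):
    """Blend historical importance with the current step's local
    attention mass via an exponential moving average. Both inputs
    have shape (num_cached_tokens,). alpha is an assumed decay
    factor, not a value taken from the paper."""
    return alpha * global_scores + (1.0 - alpha) * local_attn

def evict_lowest(keys, values, scores, budget):
    """Keep only the `budget` cached tokens with the highest
    global score, preserving their original order."""
    keep = np.sort(np.argsort(scores)[-budget:])
    return keys[keep], values[keep], scores[keep]

# Toy decoding loop: 6 cached tokens, budget of 4.
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
scores = np.zeros(6)
for _ in range(3):
    local_attn = rng.dirichlet(np.ones(len(keys)))  # stand-in for softmax attention weights
    scores = update_global_scores(scores, local_attn)
keys, values, scores = evict_lowest(keys, values, scores, budget=4)
print(keys.shape)  # (4, 8)
```

The point of the EMA is that a token briefly ignored at the current step keeps credit from earlier steps, which is one way to encode the "long-term importance" the abstract contrasts with purely local scoring.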


PULSE-ICU: A Pretrained Unified Long-Sequence Encoder for Multi-task Prediction in Intensive Care Units

Jang, Sejeong, Yoon, Joo Heung, Lee, Hyo Kyung

arXiv.org Artificial Intelligence

Intensive care unit (ICU) data are highly irregular, heterogeneous, and temporally fragmented, posing challenges for generalizable clinical prediction. We present PULSE-ICU, a self-supervised foundation model that learns event-level ICU representations from large-scale EHR sequences without resampling or manual feature engineering. A unified embedding module encodes event identity, continuous values, units, and temporal attributes, while a Longformer-based encoder enables efficient modeling of long trajectories. PULSE-ICU was fine-tuned across 18 prediction tasks, including mortality, intervention forecasting, and phenotype identification, achieving strong performance across task types. External validation on eICU, HiRID, and P12 showed substantial improvements with minimal fine-tuning, demonstrating robustness to domain shift and variable constraints. These findings suggest that foundation-style modeling can improve data efficiency and adaptability, providing a scalable framework for ICU decision support across diverse clinical environments.
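As a rough sketch of what an event-level unified embedding might look like (the additive composition, dimensions, and class below are illustrative assumptions, not the published architecture), one could sum learned embeddings for event identity and unit with linear projections of the measured value and time offset:

```python
import torch
import torch.nn as nn

class UnifiedEventEmbedding(nn.Module):
    """Illustrative sketch: each ICU event is embedded by summing
    learned embeddings for its identity and unit with projections
    of its continuous value and time offset. Vocabulary sizes and
    d_model are placeholders, not values from the paper."""
    def __init__(self, n_events=1000, n_units=50, d_model=128):
        super().__init__()
        self.event_emb = nn.Embedding(n_events, d_model)
        self.unit_emb = nn.Embedding(n_units, d_model)
        self.value_proj = nn.Linear(1, d_model)
        self.time_proj = nn.Linear(1, d_model)

    def forward(self, event_ids, unit_ids, values, times):
        return (self.event_emb(event_ids)
                + self.unit_emb(unit_ids)
                + self.value_proj(values.unsqueeze(-1))
                + self.time_proj(times.unsqueeze(-1)))

# Two events (e.g., a temperature and a blood-pressure reading).
emb = UnifiedEventEmbedding()
x = emb(torch.tensor([[3, 17]]), torch.tensor([[1, 4]]),
        torch.tensor([[98.6, 120.0]]), torch.tensor([[0.0, 2.5]]))
print(x.shape)  # torch.Size([1, 2, 128])
```

Encoding the raw timestamp offset directly is what lets such a model skip the resampling step the abstract mentions: irregular gaps become features rather than missing rows.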


Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query

Wang, Yixuan, Ji, Shiyu, Liu, Yijun, Xu, Yuzhuang, Xu, Yang, Zhu, Qingfu, Che, Wanxiang

arXiv.org Artificial Intelligence

Large language models (LLMs) rely on the key-value cache (KV cache) to accelerate decoding by reducing redundant computations. However, KV cache memory usage grows substantially with longer text sequences, posing challenges for efficient deployment. Existing KV cache eviction methods prune tokens using prefilling-stage attention scores, causing inconsistency with actual inference queries, especially under tight memory budgets. In this paper, we propose Lookahead Q-Cache (LAQ), a novel eviction framework that generates low-cost pseudo lookahead queries to better approximate the true decoding-stage queries. By using these lookahead queries as the observation window for importance estimation, LAQ achieves more consistent and accurate KV cache eviction aligned with real inference scenarios. Experimental results on the LongBench and Needle-in-a-Haystack benchmarks show that LAQ outperforms existing methods across various budget levels, achieving a 1 to 4 point improvement on LongBench under limited cache budgets. Moreover, LAQ is complementary to existing approaches and can be flexibly combined with them to yield further improvements.
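A minimal sketch of the selection step, assuming the pseudo lookahead queries have already been generated (the generation itself is elided here; the arrays and function names are illustrative, not LAQ's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_kv(keys, lookahead_queries, budget):
    """Score each cached key by its total attention mass under the
    pseudo lookahead queries (the observation window), then keep
    the top-`budget` entries in their original order.
    Shapes: keys (n, d), lookahead_queries (m, d)."""
    d = keys.shape[-1]
    attn = softmax(lookahead_queries @ keys.T / np.sqrt(d), axis=-1)  # (m, n)
    importance = attn.sum(axis=0)  # aggregate over the lookahead window
    return np.sort(np.argsort(importance)[-budget:])

rng = np.random.default_rng(1)
keys = rng.normal(size=(32, 16))
pseudo_q = rng.normal(size=(4, 16))  # stand-ins for low-cost lookahead queries
print(select_kv(keys, pseudo_q, budget=8))  # indices of the 8 retained tokens
```

Scoring against future-like queries rather than prefilling-stage ones is what the abstract means by aligning eviction with real inference: the tokens kept are those the decoder is actually likely to attend to.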


Learning-based Radio Link Failure Prediction Based on Measurement Dataset in Railway Environments

Chou, Po-Heng, Lin, Da-Chih, Wei, Hung-Yu, Saad, Walid, Tsao, Yu

arXiv.org Artificial Intelligence

In this paper, a measurement-driven framework is proposed for early radio link failure (RLF) prediction in 5G non-standalone (NSA) railway environments. Using 10 Hz metro-train traces with serving and neighbor-cell indicators, we benchmark six models, namely CNN, LSTM, XGBoost, Anomaly Transformer, PatchTST, and TimesNet, under varied observation windows and prediction horizons. When the observation window is three seconds, TimesNet attains the highest F1 score with a three-second prediction horizon, while CNN provides a favorable accuracy-latency tradeoff with a two-second horizon, enabling proactive actions such as redundancy and adaptive handovers. The results indicate that deep temporal models can anticipate reliability degradations several seconds in advance using lightweight features available on commercial devices, offering a practical path to early-warning control in 5G-based railway systems.
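The window/horizon framing can be made concrete with a short sketch: features over the last `window` samples predict whether a failure label occurs within the next `horizon` samples. The 10 Hz rate matches the paper; the label logic, array shapes, and function name are illustrative assumptions:

```python
import numpy as np

def make_windows(series, labels, window, horizon):
    """Slice a 10 Hz trace into (observation window, future label)
    pairs: features over [t-window, t) predict whether an RLF label
    occurs anywhere in [t, t+horizon)."""
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])
        y.append(int(labels[t:t + horizon].any()))
    return np.stack(X), np.array(y)

hz = 10
series = np.random.default_rng(2).normal(size=(600, 5))  # 60 s of 5 KPIs
labels = np.zeros(600, dtype=bool)
labels[450:455] = True  # a simulated RLF event
X, y = make_windows(series, labels, window=3 * hz, horizon=3 * hz)
print(X.shape, y.shape)  # (541, 30, 5) (541,)
```

Varying `window` and `horizon` over this framing is exactly the accuracy-versus-latency tradeoff the benchmark explores: a longer horizon gives more reaction time but a harder prediction problem.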


Limits of Generative Pre-Training in Structured EMR Trajectories with Irregular Sampling

Kuo, Nicholas I-Hsien, Gallego, Blanca, Jorm, Louisa

arXiv.org Artificial Intelligence

Foundation models refer to architectures trained on vast datasets using autoregressive pre-training from natural language processing to capture intricate patterns and motifs. They were originally developed to transfer such learned knowledge to downstream predictive tasks. Recently, however, some studies repurpose these learned representations for phenotype discovery without rigorous validation, risking superficially realistic but clinically incoherent embeddings. To test this mismatch, we trained two autoregressive models -- a sequence-to-sequence LSTM and a reduced Transformer -- on longitudinal ART for HIV and Acute Hypotension datasets. Controlled irregularity was added during training via random inter-visit gaps, while test sequences stayed complete. Patient-trajectory synthesis evaluated distributional and correlational fidelity. Both reproduced feature distributions but failed to preserve cross-feature structure -- showing that generative pre-training yields local realism but limited clinical coherence. These results highlight the need for domain-specific evaluation and support trajectory synthesis as a practical probe before fine-tuning or deployment.
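A minimal sketch of the kind of controlled-irregularity corruption described, assuming visits are dropped independently at a fixed rate (the rate and the endpoint-anchoring choice are illustrative, not the authors' protocol):

```python
import numpy as np

def add_random_gaps(trajectory, drop_prob=0.3, rng=None):
    """Simulate irregular sampling: independently drop intermediate
    visits with probability drop_prob, always keeping the first and
    last visit so the trajectory endpoints stay anchored.
    drop_prob is an assumed value for illustration."""
    rng = rng or np.random.default_rng()
    keep = rng.random(len(trajectory)) >= drop_prob
    keep[0] = keep[-1] = True
    return trajectory[keep]

traj = np.arange(10).reshape(10, 1)  # 10 visits, 1 feature
print(add_random_gaps(traj, rng=np.random.default_rng(3)).ravel())
```

Training on gapped sequences while evaluating synthesis against complete ones is what isolates the effect the abstract reports: marginal distributions survive the corruption, cross-feature correlations do not.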


LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning

Zhang, Haoyue, Zhang, Hualei, Ma, Xiaosong, Zhang, Jie, Guo, Song

arXiv.org Artificial Intelligence

Large Language Models (LLMs) exhibit enhanced capabilities through Chain-of-Thought reasoning. However, extended reasoning sequences introduce significant GPU memory overhead due to the growing key-value (KV) cache. Existing KV cache compression methods mitigate memory bottlenecks but struggle in long reasoning tasks. In this paper, we analyze attention patterns in reasoning tasks and reveal a Token Importance Recurrence phenomenon: a large proportion of tokens regain high attention after multiple decoding steps, which existing works fail to capture and which may lead to unpredictable eviction of such periodically critical tokens. To address this, we propose LazyEviction, an observation-window-based lagged eviction framework that retains latent recurring tokens by prioritizing eviction according to tokens' recurrence patterns. Extensive experiments demonstrate that LazyEviction reduces the KV cache by 50%~70% while maintaining comparable accuracy, outperforming existing KV cache compression baselines. Our implementation code can be found at https://github.com/Halo-949/LazyEviction.
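One way to realize lagged eviction, sketched under assumptions (the lag length, threshold, and per-step attention history below are illustrative, not the paper's mechanism): a token becomes an eviction candidate only if its attention has stayed negligible throughout a trailing observation window, so periodically recurring tokens survive.

```python
import numpy as np

def lagged_evict(attn_history, lag=8, threshold=0.01):
    """Mark a token evictable only if its attention stayed below
    `threshold` for the last `lag` decoding steps, so tokens whose
    importance recurs periodically are retained. lag and threshold
    are assumed constants. attn_history: (steps, num_tokens)."""
    window = attn_history[-lag:]  # the trailing observation window
    evictable = (window < threshold).all(axis=0)
    return np.flatnonzero(evictable)

rng = np.random.default_rng(4)
hist = rng.random((20, 6)) * 0.005  # mostly negligible attention
hist[::5, 2] = 0.5                  # token 2 regains attention every 5 steps
print(lagged_evict(hist))           # token 2 is protected: [0 1 3 4 5]
```

The lag is what distinguishes this from greedy score-based eviction: a token is judged by its whole recent pattern, not by a single low-attention step.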


From TLinFormer to TConstFormer: The Leap to Constant-Time Transformer Attention: Achieving O(1) Computation and O(1) KV Cache during Autoregressive Inference

Tang, Zhongpan

arXiv.org Artificial Intelligence

Although the Transformer has become the cornerstone of modern AI, its autoregressive inference suffers from a linearly growing KV Cache and a computational complexity of O(N^2 d), severely hindering its ability to process ultra-long sequences. To overcome this limitation, this paper introduces the TConstFormer architecture, building upon our previous work, TLinFormer. TConstFormer employs an innovative periodic state update mechanism to achieve a truly constant-size O(1) KV Cache. The computational complexity of this mechanism is also O(1) in an amortized sense: it performs purely constant-time computations for $k-1$ consecutive steps (e.g., $k=256$) and executes a single linear-time global information synchronization only on the $k$-th step. Theoretical calculations and experimental results demonstrate that TConstFormer exhibits an overwhelming advantage over baseline models in terms of speed, memory efficiency, and overall performance on long-text inference tasks. This breakthrough paves the way for efficient and robust streaming language model applications.
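The amortized-O(1) claim can be sanity-checked with a toy cost model, assuming the k-th-step synchronization touches only a constant-size state (k, the state size, and the cost units below are illustrative constants, not the paper's):

```python
def decode_cost(n_steps, k=256, state_size=1024):
    """Count abstract operation cost under the periodic schedule:
    constant-time local updates for k-1 steps, then one
    synchronization over a constant-size state on every k-th step.
    k and state_size are assumed values for illustration."""
    cost = 0
    for t in range(1, n_steps + 1):
        cost += state_size if t % k == 0 else 1
    return cost

# Per-step amortized cost stays flat as the sequence grows,
# unlike the O(N) per-step cost of a standard growing KV cache.
for n in (1024, 4096, 16384):
    print(n, decode_cost(n) / n)  # ~5.0 for every n
```

The amortized per-step cost is 1 + state_size/k, a constant independent of sequence length, which is the property that separates this schedule from attention over a linearly growing cache.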